Software Fault Tolerance for Low-to-Moderate Radiation Environments
Abstract
The primary goal of NASA’s Remote Exploration and Experimentation (REE) project is to use commercial off-the-shelf (COTS), scalable, low-power, fault-tolerant, high-performance computing in space. Most of the faults caused by radiation in the regions of space of interest to REE (deep space and low Earth orbit) are transient single-event effects. Some of these faults can cause errors at different levels of an application. System and application software can potentially detect and correct some, and possibly many, of these errors. We discuss software fault-tolerance approaches such as replication, voting, and masking, with a focus on algorithm-based fault tolerance. We also discuss combined software and hardware approaches such as fault avoidance, redundancy, masking, and reconfiguration. These approaches allow trade-offs among reliability, power, cost, and computing power for spacecraft in a low-to-moderate radiation environment.
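To make the idea of algorithm-based fault tolerance concrete, the following is a minimal sketch of the classic checksum scheme for matrix multiplication. It is an illustration under stated assumptions, not the method used by the REE project: the function names and the `inject` fault-simulation hook are hypothetical, and integer matrices are assumed so that checksum comparisons are exact (floating-point data would need a tolerance).

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add_column_checksum(a):
    """Append one row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def add_row_checksum(b):
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]


def abft_multiply(a, b, inject=None):
    """Multiply a by b with checksums; detect and correct one bad element.

    inject is a hypothetical hook (i, j, delta) that simulates a transient
    fault by corrupting one element of the checksummed result.
    Returns (product, status) with status "ok", "corrected" or
    "uncorrectable".
    """
    n, m = len(a), len(b[0])
    # The product of the checksummed inputs carries a checksum row
    # (index n) and a checksum column (index m) for the true product.
    full = matmul(add_column_checksum(a), add_row_checksum(b))
    if inject is not None:
        i, j, delta = inject
        full[i][j] += delta
    # Compare each data row and column against its stored checksum.
    bad_rows = [i for i in range(n) if sum(full[i][:m]) != full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(full[i][j] for i in range(n)) != full[n][j]]
    c = [row[:m] for row in full[:n]]
    if not bad_rows and not bad_cols:
        return c, "ok"
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        # A single corrupted data element sits where the failing row and
        # column intersect; rebuild it from the row checksum.
        i, j = bad_rows[0], bad_cols[0]
        c[i][j] = full[i][m] - (sum(c[i]) - c[i][j])
        return c, "corrected"
    # Conservative: anything else (including a corrupted checksum
    # element) is reported rather than repaired.
    return c, "uncorrectable"


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(abft_multiply(a, b))
    print(abft_multiply(a, b, inject=(0, 1, 7)))
```

The design trades a small amount of extra computation (one additional row and column) for the ability to detect and repair a single corrupted result element without recomputing the whole product, which is the kind of reliability-versus-computation trade-off the abstract refers to.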
Similar resources
Effective Fault Tolerance for Robust Robotics under Radiation Exposure
The development of fault-tolerant autonomous robots for long-term deployment has been an area of active research for decades. Many researchers have focused on the tolerance to failures of sensors and actuators of various types of robots. In contrast to these failures, robots encounter transient faults on all levels including the bit level of microprocessors when operating in space or in nuclear...
Improving the palbimm scheduling algorithm for fault tolerance in cloud computing
Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...
FDMG: Fault detection method by using genetic algorithm in clustered wireless sensor networks
Wireless sensor networks (WSNs) consist of a large number of sensor nodes which are capable of sensing different environmental phenomena and sending the collected data to the base station or Sink. Since sensor nodes are made of cheap components and are deployed in remote and uncontrolled environments, they are prone to failure; thus, maintaining a network with its proper functions even when und...
Fault-tolerant Data Sharing for High-Level Grid Programming: a Hierarchical Storage Architecture
Enabling high-level programming models on grids is today a major challenge. A way to achieve this goal relies on the use of environments able to transparently and automatically provide adequate support for low-level, grid-specific issues (fault-tolerance, scalability, etc.). This paper discusses the above approach when applied to grid data management. As a case study, we propose a 2-tier softwa...
Detailed Radiation Fault Modeling of the Remote Exploration and Experimentation (REE) First Generation Testbed Architecture
The goal of the NASA HPCC Remote Exploration and Experimentation (REE) Project is to transfer commercial supercomputing technology into space. The project will use state-of-the-art, low-power, non-radiation-hardened, Commercial Off-The-Shelf (COTS) hardware chips and COTS software to the maximum extent possible, and will rely on Software-Implemented Fault Tolerance (SIFT) to provide the required...